Overview

Dataset statistics

Number of variables: 17
Number of observations: 891221
Missing cells: 161560
Missing cells (%): 1.1%
Duplicate rows: 78443
Duplicate rows (%): 8.8%
Total size in memory: 115.6 MiB
Average record size in memory: 136.0 B
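
The percentage and per-record figures above follow directly from the raw counts. The lines below are a minimal sketch of that arithmetic, assuming that missing cells are measured against rows × variables and that 1 MiB is 1,048,576 bytes; the variable names are illustrative and not part of the report.

```python
# Counts copied from the dataset statistics above.
n_variables = 17
n_observations = 891_221
missing_cells = 161_560
duplicate_rows = 78_443
total_memory_mib = 115.6

# Assumption: the missing-cell percentage is taken over all cells.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_memory_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the three printed values round to the 1.1%, 8.8% and 136.0 B shown in the table.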

Variable types

Categorical: 10
Numeric: 7

Warnings

Dataset has 78443 (8.8%) duplicate rows [Duplicates]
HEALTH_TYP is highly correlated with NATIONALITAET_KZ and 3 other fields [High correlation]
NATIONALITAET_KZ is highly correlated with HEALTH_TYP and 1 other field [High correlation]
PRAEGENDE_JUGENDJAHRE is highly correlated with HEALTH_TYP [High correlation]
SEMIO_SOZ is highly correlated with ANREDE_KZ [High correlation]
SHOPPER_TYP is highly correlated with HEALTH_TYP and 1 other field [High correlation]
VERS_TYP is highly correlated with HEALTH_TYP and 2 other fields [High correlation]
ANREDE_KZ is highly correlated with SEMIO_SOZ [High correlation]
GEBURTSJAHR is highly correlated with PRAEGENDE_JUGENDJAHRE [High correlation]
HEALTH_TYP is highly correlated with VERS_TYP [High correlation]
PRAEGENDE_JUGENDJAHRE is highly correlated with GEBURTSJAHR and 1 other field [High correlation]
SEMIO_SOZ is highly correlated with ANREDE_KZ [High correlation]
VERS_TYP is highly correlated with HEALTH_TYP [High correlation]
ANREDE_KZ is highly correlated with SEMIO_SOZ [High correlation]
ALTERSKATEGORIE_GROB is highly correlated with PRAEGENDE_JUGENDJAHRE [High correlation]
GEBURTSJAHR is highly correlated with PRAEGENDE_JUGENDJAHRE [High correlation]
HEALTH_TYP is highly correlated with VERS_TYP [High correlation]
PRAEGENDE_JUGENDJAHRE is highly correlated with GEBURTSJAHR [High correlation]
SEMIO_SOZ is highly correlated with ANREDE_KZ [High correlation]
VERS_TYP is highly correlated with HEALTH_TYP [High correlation]
ANREDE_KZ is highly correlated with SEMIO_SOZ [High correlation]
NATIONALITAET_KZ is highly correlated with VERS_TYP and 3 other fields [High correlation]
ALTERSKATEGORIE_GROB is highly correlated with AGER_TYP and 3 other fields [High correlation]
AGER_TYP is highly correlated with ALTERSKATEGORIE_GROB and 1 other field [High correlation]
RETOURTYP_BK_S is highly correlated with ALTERSKATEGORIE_GROB and 1 other field [High correlation]
VERS_TYP is highly correlated with NATIONALITAET_KZ and 4 other fields [High correlation]
GEBURTSJAHR is highly correlated with PRAEGENDE_JUGENDJAHRE [High correlation]
ZABEOTYP is highly correlated with GREEN_AVANTGARDE and 1 other field [High correlation]
GREEN_AVANTGARDE is highly correlated with ZABEOTYP and 1 other field [High correlation]
HEALTH_TYP is highly correlated with NATIONALITAET_KZ and 3 other fields [High correlation]
PRAEGENDE_JUGENDJAHRE is highly correlated with NATIONALITAET_KZ and 9 other fields [High correlation]
ANREDE_KZ is highly correlated with SEMIO_SOZ [High correlation]
SEMIO_SOZ is highly correlated with ANREDE_KZ [High correlation]
SHOPPER_TYP is highly correlated with NATIONALITAET_KZ and 4 other fields [High correlation]
CJT_GESAMTTYP is highly correlated with VERS_TYP [High correlation]
NATIONALITAET_KZ is highly correlated with HEALTH_TYP and 2 other fields [High correlation]
HEALTH_TYP is highly correlated with NATIONALITAET_KZ and 2 other fields [High correlation]
VERS_TYP is highly correlated with NATIONALITAET_KZ and 2 other fields [High correlation]
SHOPPER_TYP is highly correlated with NATIONALITAET_KZ and 2 other fields [High correlation]
SOHO_KZ has 73499 (8.2%) missing values [Missing]
TITEL_KZ has 73499 (8.2%) missing values [Missing]
TITEL_KZ is highly skewed (γ1 = 39.64777145) [Skewed]
GEBURTSJAHR has 392318 (44.0%) zeros [Zeros]
PRAEGENDE_JUGENDJAHRE has 108164 (12.1%) zeros [Zeros]
TITEL_KZ has 815562 (91.5%) zeros [Zeros]

Reproduction

Analysis started: 2021-05-17 19:50:27.286332
Analysis finished: 2021-05-17 19:53:06.546268
Duration: 2 minutes and 39.26 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

AGER_TYP
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 49.9 MiB

Value | Count
-1 | 677503
2 | 98472
1 | 79802
3 | 27104
0 | 8340

Length

Max length: 2
Median length: 2
Mean length: 1.760196405
Min length: 1

Characters and Unicode

Total characters: 1568724
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: -1
2nd row: -1
3rd row: -1
4th row: 2
5th row: -1

Common Values

Value | Count | Frequency (%)
-1 | 677503 | 76.0%
2 | 98472 | 11.0%
1 | 79802 | 9.0%
3 | 27104 | 3.0%
0 | 8340 | 0.9%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1 | 757305 | 85.0%
2 | 98472 | 11.0%
3 | 27104 | 3.0%
0 | 8340 | 0.9%

Most occurring characters

Value | Count | Frequency (%)
1 | 757305 | 48.3%
- | 677503 | 43.2%
2 | 98472 | 6.3%
3 | 27104 | 1.7%
0 | 8340 | 0.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 891221 | 56.8%
Dash Punctuation | 677503 | 43.2%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 757305 | 85.0%
2 | 98472 | 11.0%
3 | 27104 | 3.0%
0 | 8340 | 0.9%
Dash Punctuation
Value | Count | Frequency (%)
- | 677503 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1568724 | 100.0%

Most frequent character per script

Common: same rows as the "Most occurring characters" table.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1568724 | 100.0%

Most frequent character per block

ASCII: same rows as the "Most occurring characters" table.

CJT_GESAMTTYP
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 4854
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.632838316
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 4
Q3: 5
95-th percentile: 6
Maximum: 6
Range: 5
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.595021092
Coefficient of variation (CV): 0.4390564493
Kurtosis: -1.068626824
Mean: 3.632838316
Median Absolute Deviation (MAD): 1
Skewness: -0.03888466953
Sum: 3220028
Variance: 2.544092284
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)

Common values

Value | Count | Frequency (%)
4 | 210963 | 23.7%
3 | 156449 | 17.6%
6 | 153915 | 17.3%
2 | 148795 | 16.7%
5 | 117376 | 13.2%
1 | 98869 | 11.1%
(Missing) | 4854 | 0.5%

Smallest values

Value | Count | Frequency (%)
1 | 98869 | 11.1%
2 | 148795 | 16.7%
3 | 156449 | 17.6%
4 | 210963 | 23.7%
5 | 117376 | 13.2%
6 | 153915 | 17.3%

Largest values

Value | Count | Frequency (%)
6 | 153915 | 17.3%
5 | 117376 | 13.2%
4 | 210963 | 23.7%
3 | 156449 | 17.6%
2 | 148795 | 16.7%
1 | 98869 | 11.1%
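
The descriptive statistics for CJT_GESAMTTYP can be recomputed from its frequency table alone. The following is a minimal sketch, assuming the report uses the sample variance (n - 1 denominator, the pandas default) and ignores missing rows; the `counts` dictionary is simply the table above typed in by hand.

```python
import math

# Value -> count, copied from the CJT_GESAMTTYP frequency table
# (the 4854 missing rows are excluded).
counts = {1: 98869, 2: 148795, 3: 156449, 4: 210963, 5: 117376, 6: 153915}

n = sum(counts.values())
total = sum(value * count for value, count in counts.items())
mean = total / n

# Assumption: sample variance with an n - 1 denominator.
variance = sum(count * (value - mean) ** 2 for value, count in counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

print(n, total)
print(round(mean, 6), round(variance, 4), round(cv, 4))
```

The count and sum reproduce the report's 886367 non-missing rows and Sum of 3220028 exactly; the mean, variance and CV agree with the reported values to the precision printed here.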

GEBURTSJAHR
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 117
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1101.178533
Minimum: 0
Maximum: 2017
Zeros: 392318
Zeros (%): 44.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1943
Q3: 1970
95-th percentile: 1990
Maximum: 2017
Range: 2017
Interquartile range (IQR): 1970

Descriptive statistics

Standard deviation: 976.5835513
Coefficient of variation (CV): 0.88685306
Kurtosis: -1.941480575
Mean: 1101.178533
Median Absolute Deviation (MAD): 46
Skewness: -0.240357039
Sum: 981393433
Variance: 953715.4326
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 392318 | 44.0%
1967 | 11183 | 1.3%
1965 | 11090 | 1.2%
1966 | 10933 | 1.2%
1970 | 10883 | 1.2%
1964 | 10799 | 1.2%
1968 | 10792 | 1.2%
1963 | 10513 | 1.2%
1969 | 10360 | 1.2%
1980 | 10275 | 1.2%
Other values (107) | 402075 | 45.1%

Smallest values

Value | Count | Frequency (%)
0 | 392318 | 44.0%
1900 | 4 | < 0.1%
1902 | 1 | < 0.1%
1904 | 5 | < 0.1%
1905 | 8 | < 0.1%
1906 | 7 | < 0.1%
1907 | 4 | < 0.1%
1908 | 7 | < 0.1%
1909 | 7 | < 0.1%
1910 | 41 | < 0.1%

Largest values

Value | Count | Frequency (%)
2017 | 593 | 0.1%
2016 | 167 | < 0.1%
2015 | 257 | < 0.1%
2014 | 124 | < 0.1%
2013 | 380 | < 0.1%
2012 | 806 | 0.1%
2011 | 485 | 0.1%
2010 | 545 | 0.1%
2009 | 559 | 0.1%
2008 | 550 | 0.1%
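
The reported mean of about 1101 for GEBURTSJAHR is pulled down by the 44.0% of rows that are zero. If those zeros are placeholders for an unknown birth year (an assumption; the report itself does not say so), the mean over the remaining rows can be recovered from the Sum and Zeros figures alone, as in this small sketch:

```python
# Figures copied from the GEBURTSJAHR statistics above.
n_observations = 891_221
n_zeros = 392_318
value_sum = 981_393_433  # "Sum" from the descriptive statistics

mean_all = value_sum / n_observations

# Zeros contribute nothing to the sum, so excluding them only
# shrinks the denominator.
n_nonzero = n_observations - n_zeros
mean_nonzero = value_sum / n_nonzero

print(round(mean_all, 1), round(mean_nonzero, 1))
```

Under that assumption the typical recorded birth year is in the late 1960s, which is consistent with the most common non-zero values in the table.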

GFK_URLAUBERTYP
Real number (ℝ≥0)

Distinct: 12
Distinct (%): < 0.1%
Missing: 4854
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.350304107
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 5
Median: 8
Q3: 10
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.525723215
Coefficient of variation (CV): 0.4796703869
Kurtosis: -1.23285991
Mean: 7.350304107
Median Absolute Deviation (MAD): 3
Skewness: -0.2416175217
Sum: 6515067
Variance: 12.43072419
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Common values

Value | Count | Frequency (%)
12 | 138545 | 15.5%
5 | 120126 | 13.5%
10 | 109127 | 12.2%
8 | 88042 | 9.9%
11 | 79740 | 8.9%
4 | 63770 | 7.2%
9 | 60614 | 6.8%
3 | 56007 | 6.3%
1 | 53600 | 6.0%
2 | 46702 | 5.2%
Other values (2) | 70094 | 7.9%

Smallest values

Value | Count | Frequency (%)
1 | 53600 | 6.0%
2 | 46702 | 5.2%
3 | 56007 | 6.3%
4 | 63770 | 7.2%
5 | 120126 | 13.5%
6 | 27138 | 3.0%
7 | 42956 | 4.8%
8 | 88042 | 9.9%
9 | 60614 | 6.8%
10 | 109127 | 12.2%

Largest values

Value | Count | Frequency (%)
12 | 138545 | 15.5%
11 | 79740 | 8.9%
10 | 109127 | 12.2%
9 | 60614 | 6.8%
8 | 88042 | 9.9%
7 | 42956 | 4.8%
6 | 27138 | 3.0%
5 | 120126 | 13.5%
4 | 63770 | 7.2%
3 | 56007 | 6.3%

GREEN_AVANTGARDE
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 49.3 MiB

Value | Count
0 | 715996
1 | 175225

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 891221
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 715996 | 80.3%
1 | 175225 | 19.7%

Length

Histogram of lengths of the category

Pie chart

Same rows as the Common Values table.

Most occurring characters

Same rows as the Common Values table, since every value is a single character.

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 891221 | 100.0%

Most frequent character per category

Decimal Number: same rows as the Common Values table.

Most occurring scripts

Value | Count | Frequency (%)
Common | 891221 | 100.0%

Most frequent character per script

Common: same rows as the Common Values table.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 891221 | 100.0%

Most frequent character per block

ASCII: same rows as the Common Values table.

HEALTH_TYP
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 49.4 MiB

Value | Count
3 | 310693
2 | 306944
1 | 162388
-1 | 111196

Length

Max length: 2
Median length: 1
Mean length: 1.124768155
Min length: 1

Characters and Unicode

Total characters: 1002417
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: -1
2nd row: 3
3rd row: 3
4th row: 2
5th row: 3

Common Values

Value | Count | Frequency (%)
3 | 310693 | 34.9%
2 | 306944 | 34.4%
1 | 162388 | 18.2%
-1 | 111196 | 12.5%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
3 | 310693 | 34.9%
2 | 306944 | 34.4%
1 | 273584 | 30.7%

Most occurring characters

Value | Count | Frequency (%)
3 | 310693 | 31.0%
2 | 306944 | 30.6%
1 | 273584 | 27.3%
- | 111196 | 11.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 891221 | 88.9%
Dash Punctuation | 111196 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
3 | 310693 | 34.9%
2 | 306944 | 34.4%
1 | 273584 | 30.7%
Dash Punctuation
Value | Count | Frequency (%)
- | 111196 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1002417 | 100.0%

Most frequent character per script

Common: same rows as the "Most occurring characters" table.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1002417 | 100.0%

Most frequent character per block

ASCII: same rows as the "Most occurring characters" table.

NATIONALITAET_KZ
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 49.3 MiB

Value | Count
1 | 684085
0 | 108315
2 | 65418
3 | 33403

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 891221
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 684085 | 76.8%
0 | 108315 | 12.2%
2 | 65418 | 7.3%
3 | 33403 | 3.7%

Length

Histogram of lengths of the category

Pie chart

Same rows as the Common Values table.

Most occurring characters

Same rows as the Common Values table, since every value is a single character.

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 891221 | 100.0%

Most frequent character per category

Decimal Number: same rows as the Common Values table.

Most occurring scripts

Value | Count | Frequency (%)
Common | 891221 | 100.0%

Most frequent character per script

Common: same rows as the Common Values table.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 891221 | 100.0%

Most frequent character per block

ASCII: same rows as the Common Values table.

PRAEGENDE_JUGENDJAHRE
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.154345555
Minimum: 0
Maximum: 15
Zeros: 108164
Zeros (%): 12.1%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 5
Median: 8
Q3: 14
95-th percentile: 14
Maximum: 15
Range: 15
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 4.844532197
Coefficient of variation (CV): 0.5941043538
Kurtosis: -1.11662633
Mean: 8.154345555
Median Absolute Deviation (MAD): 4
Skewness: -0.2507805175
Sum: 7267324
Variance: 23.4694922
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)

Common values

Value | Count | Frequency (%)
14 | 188697 | 21.2%
8 | 145988 | 16.4%
0 | 108164 | 12.1%
5 | 86416 | 9.7%
10 | 85808 | 9.6%
3 | 55195 | 6.2%
15 | 42547 | 4.8%
11 | 35752 | 4.0%
9 | 33570 | 3.8%
6 | 25652 | 2.9%
Other values (6) | 83432 | 9.4%

Smallest values

Value | Count | Frequency (%)
0 | 108164 | 12.1%
1 | 21282 | 2.4%
2 | 7479 | 0.8%
3 | 55195 | 6.2%
4 | 20451 | 2.3%
5 | 86416 | 9.7%
6 | 25652 | 2.9%
7 | 4010 | 0.4%
8 | 145988 | 16.4%
9 | 33570 | 3.8%

Largest values

Value | Count | Frequency (%)
15 | 42547 | 4.8%
14 | 188697 | 21.2%
13 | 5764 | 0.6%
12 | 24446 | 2.7%
11 | 35752 | 4.0%
10 | 85808 | 9.6%
9 | 33570 | 3.8%
8 | 145988 | 16.4%
7 | 4010 | 0.4%
6 | 25652 | 2.9%

RETOURTYP_BK_S
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 4854
Missing (%): 0.5%
Memory size: 50.9 MiB

Value | Count
5.0 | 297993
3.0 | 231816
4.0 | 131115
1.0 | 129712
2.0 | 95731

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2659101
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 5.0
2nd row: 1.0
3rd row: 3.0
4th row: 2.0
5th row: 5.0

Common Values

Value | Count | Frequency (%)
5.0 | 297993 | 33.4%
3.0 | 231816 | 26.0%
4.0 | 131115 | 14.7%
1.0 | 129712 | 14.6%
2.0 | 95731 | 10.7%
(Missing) | 4854 | 0.5%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
5.0 | 297993 | 33.6%
3.0 | 231816 | 26.2%
4.0 | 131115 | 14.8%
1.0 | 129712 | 14.6%
2.0 | 95731 | 10.8%

Most occurring characters

Value | Count | Frequency (%)
. | 886367 | 33.3%
0 | 886367 | 33.3%
5 | 297993 | 11.2%
3 | 231816 | 8.7%
4 | 131115 | 4.9%
1 | 129712 | 4.9%
2 | 95731 | 3.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1772734 | 66.7%
Other Punctuation | 886367 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 886367 | 50.0%
5 | 297993 | 16.8%
3 | 231816 | 13.1%
4 | 131115 | 7.4%
1 | 129712 | 7.3%
2 | 95731 | 5.4%
Other Punctuation
Value | Count | Frequency (%)
. | 886367 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2659101 | 100.0%

Most frequent character per script

Common: same rows as the "Most occurring characters" table.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2659101 | 100.0%

Most frequent character per block

ASCII: same rows as the "Most occurring characters" table.

SEMIO_SOZ
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.945859669
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 4
Q3: 6
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.946564233
Coefficient of variation (CV): 0.4933181603
Kurtosis: -1.353534476
Mean: 3.945859669
Median Absolute Deviation (MAD): 2
Skewness: 0.1789455842
Sum: 3516633
Variance: 3.789112312
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common values

Value | Count | Frequency (%)
2 | 244714 | 27.5%
6 | 136205 | 15.3%
5 | 121786 | 13.7%
3 | 118889 | 13.3%
7 | 117378 | 13.2%
4 | 90161 | 10.1%
1 | 62088 | 7.0%

Smallest values

Value | Count | Frequency (%)
1 | 62088 | 7.0%
2 | 244714 | 27.5%
3 | 118889 | 13.3%
4 | 90161 | 10.1%
5 | 121786 | 13.7%
6 | 136205 | 15.3%
7 | 117378 | 13.2%

Largest values

Value | Count | Frequency (%)
7 | 117378 | 13.2%
6 | 136205 | 15.3%
5 | 121786 | 13.7%
4 | 90161 | 10.1%
3 | 118889 | 13.3%
2 | 244714 | 27.5%
1 | 62088 | 7.0%

SHOPPER_TYP
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 49.4 MiB

Value | Count
1 | 254761
2 | 207463
3 | 190219
0 | 127582
-1 | 111196

Length

Max length: 2
Median length: 1
Mean length: 1.124768155
Min length: 1

Characters and Unicode

Total characters: 1002417
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: -1
2nd row: 3
3rd row: 2
4th row: 1
5th row: 2

Common Values

Value | Count | Frequency (%)
1 | 254761 | 28.6%
2 | 207463 | 23.3%
3 | 190219 | 21.3%
0 | 127582 | 14.3%
-1 | 111196 | 12.5%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1 | 365957 | 41.1%
2 | 207463 | 23.3%
3 | 190219 | 21.3%
0 | 127582 | 14.3%

Most occurring characters

Value | Count | Frequency (%)
1 | 365957 | 36.5%
2 | 207463 | 20.7%
3 | 190219 | 19.0%
0 | 127582 | 12.7%
- | 111196 | 11.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 891221 | 88.9%
Dash Punctuation | 111196 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 365957 | 41.1%
2 | 207463 | 23.3%
3 | 190219 | 21.3%
0 | 127582 | 14.3%
Dash Punctuation
Value | Count | Frequency (%)
- | 111196 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1002417 | 100.0%

Most frequent character per script

Common: same rows as the "Most occurring characters" table.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1002417 | 100.0%

Most frequent character per block

ASCII: same rows as the "Most occurring characters" table.

SOHO_KZ
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 73499
Missing (%): 8.2%
Memory size: 49.6 MiB

Value | Count
0.0 | 810834
1.0 | 6888

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2453166
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 810834 | 91.0%
1.0 | 6888 | 0.8%
(Missing) | 73499 | 8.2%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0.0 | 810834 | 99.2%
1.0 | 6888 | 0.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 1628556 | 66.4%
. | 817722 | 33.3%
1 | 6888 | 0.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1635444 | 66.7%
Other Punctuation | 817722 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1628556 | 99.6%
1 | 6888 | 0.4%
Other Punctuation
Value | Count | Frequency (%)
. | 817722 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2453166 | 100.0%

Most frequent character per script

Common: same rows as the "Most occurring characters" table.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2453166 | 100.0%

Most frequent character per block

ASCII: same rows as the "Most occurring characters" table.

TITEL_KZ
Real number (ℝ≥0)

MISSING
SKEWED
ZEROS

Distinct: 6
Distinct (%): < 0.1%
Missing: 73499
Missing (%): 8.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.003482846248
Minimum: 0
Maximum: 5
Zeros: 815562
Zeros (%): 91.5%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.08495716307
Coefficient of variation (CV): 24.39302714
Kurtosis: 1998.880458
Mean: 0.003482846248
Median Absolute Deviation (MAD): 0
Skewness: 39.64777145
Sum: 2848
Variance: 0.007217719557
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)

Common values

Value | Count | Frequency (%)
0 | 815562 | 91.5%
1 | 1947 | 0.2%
5 | 104 | < 0.1%
4 | 57 | < 0.1%
3 | 49 | < 0.1%
2 | 3 | < 0.1%
(Missing) | 73499 | 8.2%

Smallest values

Value | Count | Frequency (%)
0 | 815562 | 91.5%
1 | 1947 | 0.2%
2 | 3 | < 0.1%
3 | 49 | < 0.1%
4 | 57 | < 0.1%
5 | 104 | < 0.1%

Largest values

Value | Count | Frequency (%)
5 | 104 | < 0.1%
4 | 57 | < 0.1%
3 | 49 | < 0.1%
2 | 3 | < 0.1%
1 | 1947 | 0.2%
0 | 815562 | 91.5%
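
The very large skewness reported for TITEL_KZ (γ1 ≈ 39.65) can be checked against the frequency table. The sketch below computes the moment-based skewness and then applies the usual small-sample adjustment; it assumes the report relies on the bias-adjusted estimator that pandas uses, which at this sample size is practically indistinguishable from the unadjusted one.

```python
import math

# Value -> count, copied from the TITEL_KZ frequency table
# (the 73499 missing rows are excluded).
counts = {0: 815562, 1: 1947, 2: 3, 3: 49, 4: 57, 5: 104}

n = sum(counts.values())
mean = sum(value * count for value, count in counts.items()) / n

# Second and third central moments (population form, divided by n).
m2 = sum(count * (value - mean) ** 2 for value, count in counts.items()) / n
m3 = sum(count * (value - mean) ** 3 for value, count in counts.items()) / n

g1 = m3 / m2 ** 1.5  # moment-based skewness
# Adjusted Fisher-Pearson skewness (assumed to match the report's estimator).
skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)

print(n, round(mean, 6), round(skewness, 2))
```

Almost all of the mass sits at zero, so the handful of rows with values 1 to 5 dominate the third moment and drive the skewness up.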

VERS_TYP
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 49.4 MiB

Value | Count
2 | 398722
1 | 381303
-1 | 111196

Length

Max length: 2
Median length: 1
Mean length: 1.124768155
Min length: 1

Characters and Unicode

Total characters: 1002417
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: -1
2nd row: 2
3rd row: 1
4th row: 1
5th row: 2

Common Values

Value | Count | Frequency (%)
2 | 398722 | 44.7%
1 | 381303 | 42.8%
-1 | 111196 | 12.5%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1 | 492499 | 55.3%
2 | 398722 | 44.7%

Most occurring characters

Value | Count | Frequency (%)
1 | 492499 | 49.1%
2 | 398722 | 39.8%
- | 111196 | 11.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 891221 | 88.9%
Dash Punctuation | 111196 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 492499 | 55.3%
2 | 398722 | 44.7%
Dash Punctuation
Value | Count | Frequency (%)
- | 111196 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1002417 | 100.0%

Most frequent character per script

Common: same rows as the "Most occurring characters" table.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1002417 | 100.0%

Most frequent character per block

ASCII: same rows as the "Most occurring characters" table.

ZABEOTYP
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.3624376
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 3
Q3: 4
95-th percentile: 6
Maximum: 6
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.352704299
Coefficient of variation (CV): 0.4022987071
Kurtosis: -0.2449129835
Mean: 3.3624376
Median Absolute Deviation (MAD): 1
Skewness: 0.02846326419
Sum: 2996675
Variance: 1.82980892
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)

Common values

Value | Count | Frequency (%)
3 | 364905 | 40.9%
4 | 210095 | 23.6%
1 | 123622 | 13.9%
5 | 84956 | 9.5%
6 | 74473 | 8.4%
2 | 33170 | 3.7%

Smallest values

Value | Count | Frequency (%)
1 | 123622 | 13.9%
2 | 33170 | 3.7%
3 | 364905 | 40.9%
4 | 210095 | 23.6%
5 | 84956 | 9.5%
6 | 74473 | 8.4%

Largest values

Value | Count | Frequency (%)
6 | 74473 | 8.4%
5 | 84956 | 9.5%
4 | 210095 | 23.6%
3 | 364905 | 40.9%
2 | 33170 | 3.7%
1 | 123622 | 13.9%

ANREDE_KZ
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 49.3 MiB

Value | Count
2 | 465305
1 | 425916

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 891221
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 2
3rd row: 2
4th row: 2
5th row: 1

Common Values

Value | Count | Frequency (%)
2 | 465305 | 52.2%
1 | 425916 | 47.8%

Length

Histogram of lengths of the category

Pie chart

Same rows as the Common Values table.

Most occurring characters

Same rows as the Common Values table, since every value is a single character.

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 891221 | 100.0%

Most frequent character per category

Decimal Number: same rows as the Common Values table.

Most occurring scripts

Value | Count | Frequency (%)
Common | 891221 | 100.0%

Most frequent character per script

Common: same rows as the Common Values table.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 891221 | 100.0%

Most frequent character per block

ASCII: same rows as the Common Values table.

ALTERSKATEGORIE_GROB
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 49.3 MiB

Value | Count
3 | 358533
4 | 228510
2 | 158410
1 | 142887
9 | 2881

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 891221
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 1
3rd row: 3
4th row: 4
5th row: 3

Common Values

Value | Count | Frequency (%)
3 | 358533 | 40.2%
4 | 228510 | 25.6%
2 | 158410 | 17.8%
1 | 142887 | 16.0%
9 | 2881 | 0.3%

Length

Histogram of lengths of the category

Pie chart

Same rows as the Common Values table.

Most occurring characters

Same rows as the Common Values table, since every value is a single character.

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 891221 | 100.0%

Most frequent character per category

Decimal Number: same rows as the Common Values table.

Most occurring scripts

Value | Count | Frequency (%)
Common | 891221 | 100.0%

Most frequent character per script

Common: same rows as the Common Values table.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 891221 | 100.0%

Most frequent character per block

ASCII: same rows as the Common Values table.

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r (only the direction of the slope does).

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
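
The definition above translates almost directly into code. This is a plain-Python sketch for illustration, not the implementation used to build the report; `pearson_r` and the toy lists are made-up names and data.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and
    # denominator, so raw sums of products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
y = [2 * v + 1 for v in x]    # exact linear relationship
steeper = [10 * v for v in y]  # same line, rescaled
falling = [-v for v in y]      # slope direction flipped

print(pearson_r(x, y), pearson_r(x, steeper), pearson_r(x, falling))
```

Rescaling `y` leaves r unchanged, while flipping the direction of the slope flips its sign, which is the invariance property described above.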

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
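
As a rough illustration of that recipe, the sketch below ranks both variables (giving tied values their average rank, a common convention assumed here) and then applies Pearson's formula to the ranks. The function names are hypothetical and the code is not taken from the report's tooling.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman_rho(x, y), pearson_r(x, y))
```

For this cubic relationship ρ is exactly 1 while Pearson's r falls short of 1, which is the difference the paragraph above describes.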

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
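
The pair-counting definition can be sketched in a few lines. This is the plain form described above (often called tau-a); statistical libraries commonly report a tie-corrected variant instead, so results on data with many ties may differ. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs

# One swapped pair out of six: 5 concordant, 1 discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

The quadratic loop over all pairs is fine for a toy example but would be far too slow for the 891221 rows profiled here.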

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma in 2013.
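
To make the bias correction concrete, here is a small sketch of Cramér's V with Bergsma's adjustment, written from the published formula rather than from the report's source code. The function name and the toy data are illustrative only.

```python
import math
from collections import Counter

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V (Bergsma, 2013) for two categorical sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    rows = Counter(x)
    cols = Counter(y)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_total in rows.items():
        for b, col_total in cols.items():
            expected = row_total * col_total / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(rows), len(cols)
    # Bergsma's correction terms.
    phi2_corr = max(0.0, phi2 - (r - 1) * (k - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

linked = cramers_v_corrected(["a", "b"] * 50, ["u", "v"] * 50)
unrelated = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["u", "v", "u", "v"] * 25)
print(linked, unrelated)
```

A perfectly linked pair of variables scores 1 and an exactly independent pair scores 0; the correction mainly matters for small samples or tables with many categories.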

Missing values

Count: a simple visualization of nullity by column.
Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
Heatmap: the correlation heatmap measures nullity correlation, that is, how strongly the presence or absence of one variable affects the presence of another.
Dendrogram: the dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
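
Nullity correlation can be thought of as an ordinary correlation between presence indicators. The sketch below illustrates the idea on toy columns where `None` marks a missing value; it is an assumption about how such a heatmap value could be computed, not the report's own code.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'value is present' indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present (or always missing) has no
        # nullity variation, so the correlation is undefined.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)

col_a = [1, None, 3, None, 5, 6]
col_b = ["x", None, "y", None, "z", "w"]  # missing in the same rows as col_a
col_c = [None, 1, None, 1, None, None]    # present exactly where col_a is missing

print(nullity_correlation(col_a, col_b), nullity_correlation(col_a, col_c))
```

A value of +1 means two columns are always missing together and -1 means one is present exactly when the other is absent.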

Sample

First rows

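Row | AGER_TYP | CJT_GESAMTTYP | GEBURTSJAHR | GFK_URLAUBERTYP | GREEN_AVANTGARDE | HEALTH_TYP | NATIONALITAET_KZ | PRAEGENDE_JUGENDJAHRE | RETOURTYP_BK_S | SEMIO_SOZ | SHOPPER_TYP | SOHO_KZ | TITEL_KZ | VERS_TYP | ZABEOTYP | ANREDE_KZ | ALTERSKATEGORIE_GROB
0 | -1 | 2.0 | 0 | 10.0 | 0 | -1 | 0 | 0 | 5.0 | 2 | -1 | NaN | NaN | -1 | 3 | 1 | 2
1 | -1 | 5.0 | 1996 | 10.0 | 0 | 3 | 1 | 14 | 1.0 | 5 | 3 | 1.0 | 0.0 | 2 | 5 | 2 | 1
2 | -1 | 3.0 | 1979 | 10.0 | 1 | 3 | 1 | 15 | 3.0 | 4 | 2 | 0.0 | 0.0 | 1 | 5 | 2 | 3
3 | 2 | 2.0 | 1957 | 1.0 | 0 | 2 | 1 | 8 | 2.0 | 5 | 1 | 0.0 | 0.0 | 1 | 3 | 2 | 4
4 | -1 | 5.0 | 1963 | 5.0 | 0 | 3 | 1 | 8 | 5.0 | 6 | 2 | 0.0 | 0.0 | 2 | 4 | 1 | 3
5 | 3 | 2.0 | 1943 | 1.0 | 0 | 3 | 1 | 3 | 3.0 | 2 | 0 | 0.0 | 0.0 | 2 | 4 | 2 | 1
6 | -1 | 5.0 | 0 | 12.0 | 0 | 2 | 1 | 10 | 4.0 | 2 | 1 | 0.0 | 0.0 | 1 | 4 | 2 | 2
7 | -1 | 3.0 | 1964 | 9.0 | 0 | 1 | 1 | 8 | 5.0 | 7 | 0 | 0.0 | 0.0 | 1 | 1 | 1 | 1
8 | -1 | 3.0 | 1974 | 3.0 | 1 | 3 | 1 | 11 | 4.0 | 4 | 3 | 0.0 | 0.0 | 2 | 6 | 1 | 3
9 | -1 | 4.0 | 1975 | 12.0 | 1 | 2 | 1 | 15 | 4.0 | 2 | 3 | 0.0 | 0.0 | 2 | 4 | 2 | 3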

Last rows

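Row | AGER_TYP | CJT_GESAMTTYP | GEBURTSJAHR | GFK_URLAUBERTYP | GREEN_AVANTGARDE | HEALTH_TYP | NATIONALITAET_KZ | PRAEGENDE_JUGENDJAHRE | RETOURTYP_BK_S | SEMIO_SOZ | SHOPPER_TYP | SOHO_KZ | TITEL_KZ | VERS_TYP | ZABEOTYP | ANREDE_KZ | ALTERSKATEGORIE_GROB
891211 | -1 | 2.0 | 1963 | 1.0 | 0 | 3 | 1 | 8 | 5.0 | 4 | 1 | 0.0 | 0.0 | 1 | 6 | 1 | 3
891212 | -1 | 1.0 | 0 | 4.0 | 0 | 1 | 1 | 3 | 5.0 | 6 | 2 | 0.0 | 0.0 | 1 | 3 | 1 | 4
891213 | -1 | 5.0 | 1966 | 8.0 | 1 | 1 | 1 | 11 | 2.0 | 2 | 1 | 0.0 | 0.0 | 1 | 4 | 2 | 4
891214 | -1 | 4.0 | 1978 | 10.0 | 0 | 3 | 1 | 14 | 4.0 | 5 | 3 | 0.0 | 0.0 | 2 | 5 | 2 | 1
891215 | -1 | 6.0 | 0 | 12.0 | 0 | 2 | 2 | 10 | 1.0 | 2 | 1 | 0.0 | 0.0 | 1 | 4 | 2 | 2
891216 | -1 | 5.0 | 1976 | 12.0 | 0 | 3 | 1 | 14 | 3.0 | 2 | 3 | 0.0 | 0.0 | 1 | 4 | 2 | 3
891217 | -1 | 4.0 | 1970 | 1.0 | 0 | -1 | 0 | 10 | 5.0 | 4 | -1 | 0.0 | 0.0 | -1 | 6 | 1 | 2
891218 | -1 | 4.0 | 1976 | 10.0 | 0 | 1 | 1 | 14 | 4.0 | 5 | 2 | 0.0 | 0.0 | 1 | 4 | 2 | 2
891219 | -1 | 3.0 | 1994 | 9.0 | 0 | 1 | 1 | 14 | 4.0 | 7 | 0 | 0.0 | 0.0 | 2 | 5 | 1 | 1
891220 | -1 | 1.0 | 0 | 12.0 | 0 | 2 | 1 | 3 | 1.0 | 6 | 2 | 0.0 | 0.0 | 1 | 3 | 1 | 4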

Duplicate rows

Most frequently occurring

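Row | AGER_TYP | CJT_GESAMTTYP | GEBURTSJAHR | GFK_URLAUBERTYP | GREEN_AVANTGARDE | HEALTH_TYP | NATIONALITAET_KZ | PRAEGENDE_JUGENDJAHRE | RETOURTYP_BK_S | SEMIO_SOZ | SHOPPER_TYP | SOHO_KZ | TITEL_KZ | VERS_TYP | ZABEOTYP | ANREDE_KZ | ALTERSKATEGORIE_GROB | # duplicates
51132 | -1 | 5.0 | 0 | 12.0 | 0 | -1 | 0 | 0 | 5.0 | 4 | -1 | 0.0 | 0.0 | -1 | 4 | 1 | 3 | 467
51121 | -1 | 5.0 | 0 | 12.0 | 0 | -1 | 0 | 0 | 5.0 | 2 | -1 | 0.0 | 0.0 | -1 | 3 | 2 | 3 | 441
51140 | -1 | 5.0 | 0 | 12.0 | 0 | -1 | 0 | 0 | 5.0 | 5 | -1 | 0.0 | 0.0 | -1 | 3 | 2 | 3 | 389
51130 | -1 | 5.0 | 0 | 12.0 | 0 | -1 | 0 | 0 | 5.0 | 4 | -1 | 0.0 | 0.0 | -1 | 3 | 1 | 3 | 260
50166 | -1 | 5.0 | 0 | 10.0 | 0 | -1 | 0 | 0 | 5.0 | 4 | -1 | 0.0 | 0.0 | -1 | 4 | 1 | 3 | 251
51142 | -1 | 5.0 | 0 | 12.0 | 0 | -1 | 0 | 0 | 5.0 | 5 | -1 | 0.0 | 0.0 | -1 | 4 | 2 | 3 | 236
50154 | -1 | 5.0 | 0 | 10.0 | 0 | -1 | 0 | 0 | 5.0 | 2 | -1 | 0.0 | 0.0 | -1 | 3 | 2 | 3 | 223
51115 | -1 | 5.0 | 0 | 12.0 | 0 | -1 | 0 | 0 | 5.0 | 1 | -1 | 0.0 | 0.0 | -1 | 3 | 2 | 3 | 194
50174 | -1 | 5.0 | 0 | 10.0 | 0 | -1 | 0 | 0 | 5.0 | 5 | -1 | 0.0 | 0.0 | -1 | 3 | 2 | 3 | 191
34942 | -1 | 4.0 | 0 | 12.0 | 0 | 3 | 1 | 8 | 5.0 | 6 | 1 | 0.0 | 0.0 | 2 | 3 | 1 | 3 | 189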